Topic Models with Logical Constraints on Words
Abstract
This paper describes a simple method for imposing logical constraints on words in topic models, built on a recently developed topic modeling framework with Dirichlet forest priors (LDA-DF). Logical constraints are logical expressions over the pairwise constraints, Must-Links and Cannot-Links, used in the constrained clustering literature. Our method not only covers the original constraints of the existing work but also makes it easy to add new customized constraints. We discuss the validity of our method by defining its asymptotic behaviors. We verify its effectiveness with comparative studies on a synthetic corpus and interactive topic analysis on a real corpus.
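To make the idea of "logical expressions of pairwise constraints" concrete, the following is a minimal sketch and not the paper's method. It assumes a simplified, hard assignment of one topic per word (as in constrained clustering), whereas LDA-DF encodes constraints through Dirichlet forest priors. The class names `MustLink`, `CannotLink`, `And`, `Or` and the function `satisfied` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Primitive pairwise constraints (hypothetical names, for illustration only).
@dataclass(frozen=True)
class MustLink:
    a: str
    b: str


@dataclass(frozen=True)
class CannotLink:
    a: str
    b: str


# Logical connectives that combine constraints into larger expressions.
@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[MustLink, CannotLink, And, Or]


def satisfied(expr: Expr, topic_of: Dict[str, int]) -> bool:
    """Check an expression against a hard word-to-topic assignment.

    Assumption: each word belongs to exactly one topic. This is a
    simplification; a topic model assigns words probabilistically.
    """
    if isinstance(expr, MustLink):
        return topic_of[expr.a] == topic_of[expr.b]
    if isinstance(expr, CannotLink):
        return topic_of[expr.a] != topic_of[expr.b]
    if isinstance(expr, And):
        return satisfied(expr.left, topic_of) and satisfied(expr.right, topic_of)
    if isinstance(expr, Or):
        return satisfied(expr.left, topic_of) or satisfied(expr.right, topic_of)
    raise TypeError(f"unsupported expression: {expr!r}")


# "bank" goes with "money", OR it goes with "river" while staying apart from "money".
constraint = Or(
    MustLink("bank", "money"),
    And(MustLink("bank", "river"), CannotLink("bank", "money")),
)

financial = {"bank": 0, "money": 0, "river": 1}
geographic = {"bank": 1, "money": 0, "river": 1}
scattered = {"bank": 0, "money": 1, "river": 2}

print(satisfied(constraint, financial))
print(satisfied(constraint, geographic))
print(satisfied(constraint, scattered))
```

A customized constraint in this sketch is just another expression tree, which mirrors the abstract's point that new constraints can be composed from the two pairwise primitives.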
Similar resources
A probabilistic topic model based on local word relations in overlapping windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process: given the documents, it extracts the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Maximum Entropy Language Modeling with Non-Local and Syntactic Dependencies
Standard N-gram language models exploit information only from the immediate past to predict the future word. To improve the performance of a language model, two different kinds of long-range dependence, the syntactic structure and the topic of sentences, are taken into consideration. The likelihood of many words varies greatly with the topic of discussion and topics capture this difference. Synta...
Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words
Vector-space models of semantics represent words as continuously-valued vectors and measure similarity based on the distance or angle between those vectors. Such representations have become increasingly popular due to the recent development of methods that allow them to be efficiently estimated from very large amounts of data. However, the idea of relating similarity to distance in a spatial re...
Small-Variance Asymptotics for Bayesian Nonparametric Models with Constraints
Users often have additional knowledge when Bayesian nonparametric (BNP) models are employed, e.g. for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying seman...
A maximum entropy language model integrating N-grams and topic dependencies for conversational speech recognition
A compact language model which incorporates local dependencies in the form of N-grams and long distance dependencies through dynamic topic conditional constraints is presented. These constraints are integrated using the maximum entropy principle. Issues in assigning a topic to a test utterance are investigated. Recognition results on the Switchboard corpus are presented showing that with a very...